Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining
Abstract
Data mining services require accurate input data for their results to be meaningful, but privacy concerns may lead users to provide spurious information. To preserve client privacy in the data mining process, a variety of techniques based on random perturbation of data records have been proposed recently. We focus on an improved distortion process that tries to enhance accuracy by selectively modifying the list of items. The normal distortion procedure modifies each item's presence or absence with the same probability, so it offers no flexibility to tune the probability parameters to balance privacy against accuracy. In the improved distortion technique, frequent one-item sets and non-frequent one-item sets are modified with different probabilities, controlled by two probability parameters, fp and nfp, respectively. The data owner can tune these two parameters (fp and nfp) based on his or her requirements for privacy and accuracy. Experiments conducted on real-time datasets confirmed a significant increase in accuracy at a very marginal cost in privacy.
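The selective distortion described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes transactions are stored as 0/1 rows, that fp and nfp are the probabilities of keeping a bit unchanged (the abstract does not say whether they are retention or flip probabilities), and that frequent items are found with a simple minimum-support threshold. The function names `frequent_items` and `distort` are hypothetical.

```python
import random


def frequent_items(transactions, n_items, min_support):
    """Return the indices of items whose support is at least min_support."""
    counts = [0] * n_items
    for row in transactions:
        for i, bit in enumerate(row):
            counts[i] += bit
    threshold = min_support * len(transactions)
    return {i for i, c in enumerate(counts) if c >= threshold}


def distort(transactions, frequent, fp, nfp, rng):
    """Distort a 0/1 transaction matrix item by item.

    Assumption: fp and nfp are the probabilities of KEEPING a bit as it is
    for frequent and non-frequent items respectively; otherwise it is flipped.
    """
    distorted = []
    for row in transactions:
        new_row = []
        for i, bit in enumerate(row):
            keep_prob = fp if i in frequent else nfp
            new_row.append(bit if rng.random() < keep_prob else 1 - bit)
        distorted.append(new_row)
    return distorted


# Toy data: 4 transactions over 4 items (illustrative only).
transactions = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 0, 0],
]
frequent = frequent_items(transactions, n_items=4, min_support=0.5)
released = distort(transactions, frequent, fp=0.9, nfp=0.5, rng=random.Random(0))
```

Under these assumptions, setting fp equal to nfp reduces the sketch to the uniform distortion the abstract calls the normal procedure. Moving fp closer to 1 keeps frequent items more faithful, which favors accuracy, while nfp can be set independently to control how much noise non-frequent items receive.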
Similar resources
Privacy Preserving Frequent Itemset Mining by Reducing Sensitive Items Frequency using GA
Frequent Itemset mining extracts novel and useful knowledge from large repositories of data and this knowledge is useful for effective analysis and decision making in telecommunication networks, marketing, medical analysis, website linkages, financial transactions, advertising and other applications. The misuse of these techniques may lead to disclosure of sensitive information. Motivated by th...
Privacy Preserving Outsourcing for Frequent Itemset Mining
Cloud computing uses the paradigm of data mining-as-a-service. A company/store lacking in mining expertise can outsource its mining needs to a service provider (server). The item-sets of the outsourced database are the private property of the data owner. To protect this corporate privacy, the data owner encrypts the data and sends it to the server. Based on the mining queries sent from client side,...
Candidate Pruning-Based Differentially Private Frequent Itemsets Mining
Frequent Itemsets Mining (FIM) is a typical data mining task and has gained much attention. Out of concern for individual privacy, various studies have focused on privacy-preserving FIM problems. Differential privacy has emerged as a promising scheme for protecting individual privacy in data mining against adversaries with arbitrary background knowledge. In this paper, we present ...
An Improved Approach to High Level Privacy Preserving Itemset Mining
Privacy preserving association rule mining has triggered the development of many privacy-preserving data mining techniques. A large fraction of them use randomized data distortion techniques to mask the data for preserving. This paper proposes a new transaction randomization method which is a combination of the fake transaction randomization method and a new per-transaction randomization method...
An Improved EMASK Algorithm for Privacy-Preserving Frequent Pattern Mining
As a novel research direction, privacy-preserving data mining (PPDM) has received a great deal of attention from a growing number of researchers, and a large number of PPDM algorithms use randomization distortion techniques to mask the data and preserve the privacy of sensitive data. In reality, for PPDM in the data sets, which consist of terabytes or even petabytes of data, efficiency is a paramo...